THE SYSTEMATIC LOCATION OF GENES BY
MEANS OF CROSSOVER OBSERVATIONS

R. A. FISHER

ROTHAMSTED EXPERIMENTAL STATION

1. Introductory

In the construction of a chromosome map, the distances between
neighboring genes are equated to the percentage of crossovers which
have been observed between them. Owing to errors of random sampling,
and sometimes to other disturbing causes, inconsistencies always
arise between the distances so determined. For example, in the
important data given by Lancefield and Metz for the sex chromosome
of Drosophila willistoni [1, p. 241] we have the following values:





TABLE I

                  Crossover     Number of      Number of
                  Percentage    Observations   Crossovers
Scute-Beaded         1.43            279             4
Beaded-Rough         2.42            455            11
Scute-Rough          7.09          6,388           453

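The inconsistency in Table I can be made explicit with a few lines of arithmetic. The sketch below recomputes the three percentages from the counts and compares the sum of the two adjacent intervals with the outer interval. The pair names are taken from the matching rows of Tables II and III, and the variable names are illustrative only.

```python
# Counts from Table I: (number of crossovers, number of observations).
# Pair names are those of the rows of Tables II and III with the same values.
counts = {
    "Scute-Beaded": (4, 279),
    "Beaded-Rough": (11, 455),
    "Scute-Rough": (453, 6388),
}

# Observed crossover percentage for each pair.
percent = {pair: 100 * x / n for pair, (x, n) in counts.items()}

# With double crossing over ignored, the two adjacent intervals should
# add up to the outer interval; the observed values do not.
sum_of_parts = percent["Scute-Beaded"] + percent["Beaded-Rough"]
gap = percent["Scute-Rough"] - sum_of_parts

for pair, value in percent.items():
    print(f"{pair:13s} {value:5.2f}%")
print(f"sum of adjacent intervals: {sum_of_parts:.2f}%")
print(f"outer interval minus sum:  {gap:.2f}%")
```

The two short intervals rest on only 4 and 11 crossovers, so a gap of this size is not in itself surprising.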

Within such a small range, double crossing over may be ignored; yet
it would be wrong to use such inconsistencies as an argument against
the linear arrangement of the genes. For although the true crossover
values may be accurately additive, errors of random sampling will
certainly disturb the observed percentages. The practical problem is
to assign to the distances between the genes values which shall be as
far as possible in accord with the whole of the observations
available. In other words, we have to make use of as much as
practicable, ideally the whole, of the information supplied by the
data; giving due weight (i) to the greater accuracy of the values
obtained from the larger number of observations, (ii) to the greater
accuracy of values obtained from closer pairs. In general, too, we
shall have to consider not three genes only, but a large number,
lying sufficiently close together for double crossing over to be
ignored, the percentage observed between each pair of which gives
indirect information as to the position of all the others.

In its general character the problem resembles those problems
involving errors of observation, where a smaller number of unknowns
are determined from a larger number of inconsistent equations, and
which are usually solved by the method of least squares. The
practical solution depends on the construction of a number of
"normal equations" for the unknowns, in which the inconsistencies of
the data are properly weighted and made to balance. To make the sum
of the squares of the errors of the crossover percentages a minimum
would, however, be wrong, and the method of least squares is not
directly applicable. It has been shown (2) that the whole of the
information supplied by the data is made use of by the method of
maximum likelihood, and by a first approximation the required normal
equations may be constructed.

[Figure: map of the eight genes, labelled from top to bottom Rimmed,
Deformed, Triple, Rough, Beaded, Peach, Scute, Reduced.]

2. Mathematical Theory

In the above example, if we write $p_1$ and $p_2$ for the two
adjacent crossover ratios, the probability of the actual series of
observations will be proportional to

$$p_1^{4}\,(1-p_1)^{275}\;p_2^{11}\,(1-p_2)^{444}\;(p_1+p_2)^{453}\,(1-p_1-p_2)^{5935}$$

and the likelihood of any given pair of values for $p_1$ and $p_2$
will be proportional to the same quantity. In order to make this
quantity a maximum for variations of $p_1$ and $p_2$, we have the
equations

$$\frac{4}{p_1} - \frac{275}{1-p_1} + \frac{453}{p_1+p_2} - \frac{5935}{1-p_1-p_2} = 0,$$

$$\frac{453}{p_1+p_2} - \frac{5935}{1-p_1-p_2} + \frac{11}{p_2} - \frac{444}{1-p_2} = 0. \tag{1}$$

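The exact equations have no closed solution, but they are easily solved numerically. The following is a minimal sketch, assuming Newton's method with a simple step-halving safeguard; neither the method nor the function names come from the paper, which proceeds instead by a linear approximation.

```python
# Counts from Table I: crossovers x and non-crossovers y = n - x.
X1, Y1 = 4, 275        # first adjacent pair  (ratio p1)
X2, Y2 = 11, 444       # second adjacent pair (ratio p2)
X12, Y12 = 453, 5935   # outer pair           (ratio p1 + p2)


def score(p1, p2):
    """Left-hand sides of the two exact equations (log-likelihood gradient)."""
    outer = X12 / (p1 + p2) - Y12 / (1 - p1 - p2)
    return (X1 / p1 - Y1 / (1 - p1) + outer,
            X2 / p2 - Y2 / (1 - p2) + outer)


def hessian(p1, p2):
    """Second derivatives (h11, h12, h22) of the log-likelihood."""
    outer = -X12 / (p1 + p2) ** 2 - Y12 / (1 - p1 - p2) ** 2
    h11 = -X1 / p1 ** 2 - Y1 / (1 - p1) ** 2 + outer
    h22 = -X2 / p2 ** 2 - Y2 / (1 - p2) ** 2 + outer
    return h11, outer, h22


def solve_exact(p1=0.02, p2=0.03, iterations=50):
    """Newton's method; a step is halved if it would leave the valid region."""
    for _ in range(iterations):
        g1, g2 = score(p1, p2)
        h11, h12, h22 = hessian(p1, p2)
        det = h11 * h22 - h12 * h12
        # Newton step: solve H * step = -g for the symmetric 2x2 matrix H.
        s1 = -(h22 * g1 - h12 * g2) / det
        s2 = -(h11 * g2 - h12 * g1) / det
        while not (0 < p1 + s1 and 0 < p2 + s2 and p1 + s1 + p2 + s2 < 1):
            s1, s2 = s1 / 2, s2 / 2
        p1, p2 = p1 + s1, p2 + s2
    return p1, p2


p1_hat, p2_hat = solve_exact()
print(f"p1 = {100 * p1_hat:.2f}%, p2 = {100 * p2_hat:.2f}%")
```

The starting values 0.02 and 0.03 are arbitrary guesses in the neighborhood of the observed ratios.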
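These equations are exact, but for practical purposes we need
equations linear in $p_1$ and $p_2$, and a first approximation is
sufficient; if $p$ differs little from $x/(x+y) = x/n$, then

$$\frac{x}{p} - \frac{y}{1-p} = -\left(\frac{x}{p^2} + \frac{y}{(1-p)^2}\right)\left(p - \frac{x}{n}\right) + \dots = -\frac{n^3}{xy}\,p + \frac{n^2}{y},$$

the coefficient in brackets being evaluated at $p = x/n$, where it
equals $n^3/xy$, and the higher terms being neglected.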
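As an illustrative check of this first approximation, the sketch below compares the exact expression with its linear form for the largest entry of Table I. The function names are hypothetical and the trial value p = 0.07 is chosen only for illustration.

```python
def exact_term(x, n, p):
    """Exact contribution x/p - y/(1-p) of one observation."""
    y = n - x
    return x / p - y / (1 - p)


def linear_term(x, n, p):
    """First-order form -(n^3/xy) p + n^2/y, expanded about p = x/n."""
    y = n - x
    return -(n ** 3) / (x * y) * p + n ** 2 / y


x, n = 453, 6388          # outer pair of Table I
p0 = x / n                # observed ratio, where both forms vanish

for p in (p0, 0.07, 0.075):
    print(f"p = {p:.4f}  exact = {exact_term(x, n, p):9.3f}  "
          f"linear = {linear_term(x, n, p):9.3f}")
```

Both forms vanish at the observed ratio and stay close for nearby values of p; the agreement worsens as p moves away from x/n.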

So that we may rewrite equations (1) in the practical and
approximate form

$$\frac{279^3}{4 \times 275}\,p_1 + \frac{6388^3}{453 \times 5935}\,(p_1+p_2) = \frac{279^2}{275} + \frac{6388^2}{5935},$$

$$\frac{6388^3}{453 \times 5935}\,(p_1+p_2) + \frac{455^3}{11 \times 444}\,p_2 = \frac{6388^2}{5935} + \frac{455^2}{444}.$$
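The two approximate equations are linear and can be solved directly. A minimal sketch, using Cramer's rule and a hypothetical helper `weights`, is:

```python
def weights(x, n):
    """Return the two quantities n^3/(x*y) and n^2/y for one observation."""
    y = n - x
    return n ** 3 / (x * y), n ** 2 / y


w1, c1 = weights(4, 279)       # first adjacent pair, involves p1
w2, c2 = weights(11, 455)      # second adjacent pair, involves p2
w12, c12 = weights(453, 6388)  # outer pair, involves p1 + p2

# Coefficients of the two linear equations in p1 and p2.
a11, a12, a22 = w1 + w12, w12, w2 + w12
b1, b2 = c1 + c12, c2 + c12

# Cramer's rule for the symmetric 2x2 system.
det = a11 * a22 - a12 * a12
p1 = (b1 * a22 - a12 * b2) / det
p2 = (a11 * b2 - a12 * b1) / det

print(f"p1 = {100 * p1:.2f}%, p2 = {100 * p2:.2f}%, "
      f"p1 + p2 = {100 * (p1 + p2):.2f}%")
```

The solution is additive by construction: the value assigned to the outer interval is simply the sum of the two fitted intervals.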

For each percentage observation, therefore, we have merely to
calculate the two quantities $n^3/xy$ and $n^2/y$; then normal
equations may be constructed in the form

$$a_{11}\,p_1 + a_{12}\,p_2 + \dots = b_1,$$

$$a_{12}\,p_1 + a_{22}\,p_2 + \dots = b_2,$$

where $a_{12}$ is the sum of the quantities $n^3/xy$ for which both
$p_1$ and $p_2$ are involved, $a_{11}$ the corresponding sum for all
in which $p_1$ is involved, and $b_1$ the sum of the quantities
$n^2/y$ for which $p_1$ is involved.
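The rule may be stated compactly. In the sketch below the index $k$ and the sets $S_k$ are notation introduced here for illustration, not the paper's: observation $k$ has $x_k$ crossovers and $y_k = n_k - x_k$ non-crossovers among $n_k$ observations, and $S_k$ is the set of intervals lying between its two genes.

```latex
a_{ij} = \sum_{k \,:\, i \in S_k,\; j \in S_k} \frac{n_k^{3}}{x_k\, y_k},
\qquad
b_i = \sum_{k \,:\, i \in S_k} \frac{n_k^{2}}{y_k},
\qquad
\sum_{j} a_{ij}\, p_j = b_i .
```

Each observation thus adds the same weight $n_k^3/x_k y_k$ to every coefficient $a_{ij}$ whose two intervals it spans, which is why the matrix of coefficients is symmetric.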

3. Practical Example 

In order to illustrate the practical application of this method to a
complex case, we will consider the location of the 8 genes, from
Reduced to Rimmed, in the middle of the sex chromosome of Drosophila
willistoni. We have here 7 intervals to determine, and fifteen
crossover percentages are given [1]. Table II shows the data, and
the series of weighting quantities derived from them.



TABLE II

                   Percentage     x        n       n²/y      n³/xy    Unknowns Involved
Reduced-Scute          .95        27     2,848   2,875.26   303,287   p1
Reduced-Rough         6.24        37       593     632.46    10,136   p1, p2, p3, p4
Scute-Peach           1.81         8       442     450.15    24,871   p2
Scute-Beaded          1.43         4       279     283.06    19,742   p2, p3
Scute-Rough           7.09       453     6,388   6,875.58    96,956   p2, p3, p4
Scute-Deformed        7.24        50       691     744.90    10,295   p2, p3, p4, p5, p6
Scute-Rimmed          9.91       189     1,908   2,117.78    21,379   p2, p3, p4, p5, p6, p7
Peach-Beaded          1.70         3       176     179.05    10,504   p3
Peach-Rough           5.05        33       654     688.75    13,650   p3, p4
Beaded-Rough          2.42        11       455     466.27    19,287   p4
Rough-Triple           .49         4       809     813.02   164,433   p5
Rough-Deformed        2.39        12       503     515.29    21,599   p5, p6
Rough-Rimmed          2.26        62     2,742   2,805.43   124,072   p5, p6, p7
Triple-Rimmed         1.00         6       601     607.06    60,807   p6, p7
Deformed-Rimmed       4.17         2        48      50.09     1,202   p7

From this table we write down the normal equations

$$\begin{aligned}
313{,}423\,p_1 + 10{,}136\,(p_2+p_3+p_4) &= 3{,}507.72\\
10{,}136\,p_1 + 183{,}380\,p_2 + 158{,}509\,p_3 + 138{,}766\,p_4 + 31{,}674\,p_5 + 31{,}674\,p_6 + 21{,}379\,p_7 &= 11{,}103.93\\
10{,}136\,p_1 + 158{,}509\,p_2 + 182{,}663\,p_3 + 152{,}416\,p_4 + 31{,}674\,p_5 + 31{,}674\,p_6 + 21{,}379\,p_7 &= 11{,}521.58\\
10{,}136\,p_1 + 138{,}766\,p_2 + 152{,}416\,p_3 + 171{,}703\,p_4 + 31{,}674\,p_5 + 31{,}674\,p_6 + 21{,}379\,p_7 &= 11{,}525.74\\
31{,}674\,(p_2+p_3+p_4) + 341{,}778\,p_5 + 177{,}345\,p_6 + 145{,}451\,p_7 &= 6{,}996.42\\
31{,}674\,(p_2+p_3+p_4) + 177{,}345\,p_5 + 238{,}152\,p_6 + 206{,}258\,p_7 &= 6{,}790.46\\
21{,}379\,(p_2+p_3+p_4) + 145{,}451\,p_5 + 206{,}258\,p_6 + 207{,}460\,p_7 &= 5{,}580.36
\end{aligned}$$
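The construction and solution of the seven normal equations can be sketched as follows. The code builds the coefficients from the counts of Table II and solves by ordinary Gaussian elimination, which is a substitution for the hand method described in the text; the names `TABLE_II`, `normal_equations` and `solve` are illustrative, and interval k is taken to lie between the k-th and (k+1)-th gene in the order Reduced, Scute, Peach, Beaded, Rough, Triple, Deformed, Rimmed.

```python
# Table II: (crossovers x, observations n, first interval, last interval).
TABLE_II = [
    (27, 2848, 1, 1),    # Reduced-Scute
    (37, 593, 1, 4),     # Reduced-Rough
    (8, 442, 2, 2),      # Scute-Peach
    (4, 279, 2, 3),      # Scute-Beaded
    (453, 6388, 2, 4),   # Scute-Rough
    (50, 691, 2, 6),     # Scute-Deformed
    (189, 1908, 2, 7),   # Scute-Rimmed
    (3, 176, 3, 3),      # Peach-Beaded
    (33, 654, 3, 4),     # Peach-Rough
    (11, 455, 4, 4),     # Beaded-Rough
    (4, 809, 5, 5),      # Rough-Triple
    (12, 503, 5, 6),     # Rough-Deformed
    (62, 2742, 5, 7),    # Rough-Rimmed
    (6, 601, 6, 7),      # Triple-Rimmed
    (2, 48, 7, 7),       # Deformed-Rimmed
]
K = 7  # number of intervals (unknowns)


def normal_equations(table, k):
    """Accumulate n^3/xy into a[i][j] and n^2/y into b[i] for spanned intervals."""
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for x, n, first, last in table:
        y = n - x
        w = n ** 3 / (x * y)
        c = n ** 2 / y
        for i in range(first - 1, last):
            b[i] += c
            for j in range(first - 1, last):
                a[i][j] += w
    return a, b


def solve(a, b):
    """Gaussian elimination with partial pivoting on an augmented copy."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    p = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * p[j] for j in range(i + 1, n))
        p[i] = (m[i][n] - s) / m[i][i]
    return p


A, B = normal_equations(TABLE_II, K)
P = solve(A, B)
for i, value in enumerate(P, start=1):
    print(f"p{i} = {100 * value:.2f}%")
```

Summed in this way, the last diagonal coefficient (the four entries of Table II involving p7) comes to about 207,460, and the fitted intervals agree with the calculated column of Table III to within rounding.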

Using a calculating machine, the work so far is rapid and
mechanical; the solution of the normal equations may in this case be
much simplified by observing the uniformity of some of the sets of
coefficients, a type of uniformity which is probably characteristic
of crossover data. Thus by considering $(p_2 + p_3 + p_4)$ as a
single quantity, $p_1$ is immediately expressible in terms of it,
and by solving the last three equations we may do the same for
$p_5$, $p_6$ and $p_7$; substituting finally in the second, third
and fourth equations we solve them for $p_2$, $p_3$ and $p_4$, and
obtain the values shown in Table III.

The seven values obtained give mutually consistent values for the
crossover percentages between the fifteen pairs tested, and are
therefore suitable for the construction of a chromosome map. If the
conditions of Maximum Likelihood had been exactly fulfilled they
would agree better than any other consistent series of values with
the percentages observed. As it is, it is only in the aberrant value
of $p_7$ that the assumption that the observed values are
approximately correct breaks down, and it is probable that such
cases will only occur when the data are admittedly insufficient.

TABLE III

                    Calculated    Observed   Difference   Standard    d²/σ²
                                                 d          Error
Reduced-Scute         .90  p1        .95       + .05         .18        .08
Reduced-Rough        7.66           6.24       -1.42        1.09       1.70
Scute-Peach          1.67  p2       1.81       + .14         .61        .05
Scute-Beaded         2.98           1.43       -1.55        1.02       2.31
Scute-Rough          6.76           7.09       + .33         .31       1.13
Scute-Deformed       8.40           7.24       -1.16        1.06       1.20
Scute-Rimmed         8.97           9.91       + .94         .65       2.09
Peach-Beaded         1.31  p3       1.70       + .39         .86        .21
Peach-Rough          5.09           5.05       - .04         .86        .00
Beaded-Rough         3.78  p4       2.42       -1.36         .89       2.34
Rough-Triple          .69  p5        .49       - .20         .29        .48
Rough-Deformed       1.64           2.39       + .75         .57       1.73
Rough-Rimmed         2.21           2.26       + .05         .28        .03
Triple-Rimmed        1.52           1.00       - .52         .50       1.08
Deformed-Rimmed       .57  p7       4.17       +3.60        1.09      10.91

                                                            χ² = 25.34



Table III is arranged to compare the differences between the
calculated and the observed percentages with the standard errors due
to sampling; except for $p_7$ all the differences are less than
twice their standard errors; thus showing the general agreement
between the data and the theory of linear arrangement of the genes.
The fit, however, is not a close one, even if we omit $p_7$; in the
present state of our knowledge this will not throw any doubt on the
scheme of linear arrangement, but will suggest that the crossover
ratios in this part of the chromosome were not constant in all the
strains used to compile the data.
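The comparison in Table III can be retraced numerically. The sketch below assumes that the standard error of each percentage is the binomial value, the square root of p(1 - p)/n taken at the calculated ratio; the text does not state this formula, though it does reproduce the tabulated errors. The calculated percentages are the sums of the fitted intervals, rounded to two decimals.

```python
import math

# (pair, observations n, calculated %, observed %) from Tables II and III.
ROWS = [
    ("Reduced-Scute", 2848, 0.90, 0.95),
    ("Reduced-Rough", 593, 7.66, 6.24),
    ("Scute-Peach", 442, 1.67, 1.81),
    ("Scute-Beaded", 279, 2.98, 1.43),
    ("Scute-Rough", 6388, 6.76, 7.09),
    ("Scute-Deformed", 691, 8.40, 7.24),
    ("Scute-Rimmed", 1908, 8.97, 9.91),
    ("Peach-Beaded", 176, 1.31, 1.70),
    ("Peach-Rough", 654, 5.09, 5.05),
    ("Beaded-Rough", 455, 3.78, 2.42),
    ("Rough-Triple", 809, 0.69, 0.49),
    ("Rough-Deformed", 503, 1.64, 2.39),
    ("Rough-Rimmed", 2742, 2.21, 2.26),
    ("Triple-Rimmed", 601, 1.52, 1.00),
    ("Deformed-Rimmed", 48, 0.57, 4.17),
]


def standard_error(calc_percent, n):
    """Assumed binomial standard error of a percentage, in percent."""
    p = calc_percent / 100
    return 100 * math.sqrt(p * (1 - p) / n)


chi_square = 0.0
for pair, n, calc, obs in ROWS:
    d = obs - calc                    # difference, observed minus calculated
    se = standard_error(calc, n)
    chi_square += (d / se) ** 2
    print(f"{pair:16s} d = {d:+5.2f}  s.e. = {se:4.2f}  d2/s2 = {(d / se) ** 2:5.2f}")

print(f"chi-square = {chi_square:.2f}")
```

Because the standard errors are not rounded before squaring, the total differs slightly from the tabulated 25.34.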

In estimating the Goodness of Fit of data of this kind, $\chi^2$ may
be calculated by summing the values of $d^2/\sigma^2$, as in Table
III. Attention should, however, be called to the fact that it has
been recently shown (3) that in entering Elderton's Table we must
put $n'$ equal to one more than the number of degrees of freedom
remaining after we have fitted our unknowns to the data. In the
present case we have found 7 unknowns from 15 equations, leaving 8
degrees of freedom, so that $n'$ should be 9, and not 16.
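In place of a printed table, the probability of exceeding a given value of chi-square can be computed directly when the number of degrees of freedom is even. The sketch below uses the standard closed form for that case; the function name is hypothetical, and the resulting probability is not a figure quoted in the text.

```python
import math


def chi_square_tail(chi2, dof):
    """P(X >= chi2) for a chi-square variate with an even number of dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive even dof")
    half = chi2 / 2
    term, total = 1.0, 1.0
    # Sum of (chi2/2)^k / k! for k = 0 .. dof/2 - 1.
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


observations, unknowns = 15, 7
dof = observations - unknowns     # degrees of freedom left after fitting
n_prime = dof + 1                 # value with which Elderton's Table is entered
p_value = chi_square_tail(25.34, dof)

print(f"degrees of freedom = {dof}, n' = {n_prime}, P = {p_value:.4f}")
```

With 8 degrees of freedom so small a probability bears out the remark that the fit is not a close one.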

In conclusion it should be noted that to be available for the use of
this process the crossover data should be stated in the form in
which they are given by Lancefield and Metz, in which the crossovers
tabled between any two genes do not include those experiments in
which an intermediate gene was under observation. The practice of
throwing together all the crossovers between two genes, in order to
improve the ratios between the more distant points, causes the same
crossover to appear repeatedly in different entries. The data are no
longer the product of independent experiments, and must be
re-summarized before reduction.